We found a unified formula describing the household incomes of all
society classes, for instance, those of the European Union in year 2007.
This formula is a stationary solution of the threshold Fokker-Planck
equation (derived from the threshold nonlinear Langevin equation). The
formula is more general than the well-known formula of Yakovenko et al.
because it satisfactorily describes not only the household incomes of the
low- and medium-income society classes but also those of the high-income
society class.
1. Introduction

In the study of socio-economic systems, physics-oriented approaches have
been widely developed to explain various socio-economic processes [THE].
Those approaches aim at formulating well-fitted, unbiased indicators of
social and economic phenomena. One of their key issues is the analysis of
the income of society by using methods of statistical physics, in
particular, stochastic dynamics treated as the ab initio level. The main
goal of this analysis is to unravel and describe the mechanisms of
societies' enrichment or impoverishment.

† e-mail: zagielski@interia.pl

In the recent decade, a large number of studies aimed at constructing
models that would (to some extent) replicate well the observed
complementary cumulative distribution functions of individual incomes.
Among them, the most significant seem to be the
Clementi-Matteo-Gallegati-Kaniadakis approach [9], the Generalized
Lotka-Volterra Model [1HS], the Boltzmann-Gibbs law [10-13], and the
Yakovenko et al. model [2, 3]. However, none of the above attempts to find
an analytical description of the income structure solves the principal
challenges, which concern:

(i) the description of the annual household incomes of all society classes 
(including the third, i.e. the high-income society class) by a single 
unified formula based on the ab initio level and 

(ii) the problem regarding corresponding complete microscopic (microeco- 
nomic) mechanism responsible for the income structure and dynamics. 

In the considerations presented herein, we used the Boltzmann-Gibbs law,
the weak Pareto law and the Yakovenko et al. model to derive a uniform
analytical formula describing all three society classes.
2. Extended Yakovenko et al. model

In accord with an effort outlined above, we compared the empirical data 
of the annual household incomes in the European Union (EU), including 
Norway and Iceland, with predictions of our theoretical approach proposed 
herein. This approach is directly inspired by the Yakovenko et al. model. 
By using the generalised assumptions we extended this model to solve our 
principal challenges (i) and (ii) indicated above. 

We used data records from the Eurostat Survey on Income and Living
Conditions (EU-SILC) [2], by way of example for year 2007 [15] (containing
around 200 thousand empirical data points). However, these records contain
only a few data points concerning the high-income society class, i.e. the
third region in the plot of the complementary cumulative probability
distribution function vs. annual household income. To consider the
high-income society class systematically, we additionally analysed the
effective income of billionaires¹ in the EU by using the Forbes 'The
World's Billionaires' rank [16].

We were able to consider incomes of three society classes thanks to the 
following procedure. 

(i) Firstly, we selected EU billionaires' wealth from the Forbes rank, for 
instance, for two successive years 2006 and 2007. 

(ii) Secondly, we calculated their incomes for year 2007. This calculation
was possible because we assumed that billionaires' incomes were
proportional to the corresponding differences between their wealth for a
pair of successive years, here 2007 and 2006. Notably, we took into
account only billionaires who gained effective incomes².

¹ The term 'billionaire' used herein is equivalent (as in the US
terminology) to the term 'multimillionaire' used in the European
terminology. Since we consider the wealth and income of billionaires in
euros, we converted US dollars to euros by using the mean exchange rate on
the day the Forbes 'The World's Billionaires' rank was constructed.

² The billionaires who gained effective incomes are billionaires whose
incomes are greater than zero.

(iii) Subsequently, having calculated incomes for the high-income society
class, we joined them with the EU-SILC dataset. By using the dataset
completed in this way, we then constructed the initial empirical
complementary cumulative distribution function for year 2007. For that, we
used the well-known Weibull recipe [17, 18]. However, this direct approach
shows a wide gap of incomes inside the high-income society class,
resulting in a horizontal segment of the complementary cumulative
distribution function. This gap separates the first segment belonging to
the high-income society class, consisting of all data points taken from
the EU-SILC dataset, from the second segment, consisting of the remaining
data points, which also belong to the high-income society class but are
taken from the Forbes dataset.

(iv) In the final step, we eliminated this gap by adopting the assumption
that the empirical complementary cumulative distribution function
(concerning the whole society) has no horizontal segments. That is, we
assumed that the statistics of incomes is a continuous function of income
(i.e. it has no disruption). Hence, we were forced to multiply the
billionaire incomes from the Forbes dataset by a properly chosen common
proportionality factor. This factor was equal to 1.0 × 10^-2, as we
required full overlap of the first (above-mentioned) segment by the second
segment. This requirement leads to a unique solution (up to some
negligible statistical error) for this proportionality factor. We found
that this factor was only a slowly varying function of time (i.e. of the
year).
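
The procedure (i)-(iv) can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function names
`billionaire_incomes` and `merge_datasets` and all numbers are hypothetical,
and the proportionality factor is passed in as a given value (1.0 × 10^-2,
as quoted above) rather than determined from the overlap requirement.

```python
def billionaire_incomes(wealth_prev, wealth_curr, scale=1.0e-2):
    """Income taken as scale * (wealth difference between successive years).

    Only positive ("effective") incomes are kept, as in step (ii).
    `scale` stands in for the common proportionality factor of step (iv).
    """
    diffs = (curr - prev for prev, curr in zip(wealth_prev, wealth_curr))
    return [scale * d for d in diffs if d > 0.0]


def merge_datasets(survey_incomes, forbes_incomes):
    """Join both records into one ascending list of incomes (step (iii))."""
    return sorted(list(survey_incomes) + list(forbes_incomes))


# Toy numbers in EUR, purely illustrative (not EU-SILC or Forbes values).
wealth_2006 = [1.0e9, 2.0e9, 3.0e9]
wealth_2007 = [1.5e9, 1.8e9, 3.2e9]
survey = [2.0e4, 3.5e4, 9.0e4, 4.0e5]

forbes = billionaire_incomes(wealth_2006, wealth_2007)
data = merge_datasets(survey, forbes)
print(len(forbes), len(data))
```

In this toy record the second billionaire lost wealth between the two
years and is therefore dropped before the two datasets are joined.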

Hence, we obtained a data record containing a sufficient number of data
points for all society classes, including the high-income society class.
Although the Forbes empirical data only roughly estimate the wealth of
billionaires, they establish the billionaires' rank quite well, thus
sufficiently justifying our approach. This is because our purpose is to
assign billionaires to a concrete universality class rather than to find
their total incomes.

The basic tool of our analysis is the empirical complementary cumulative
distribution function, which is typical in this context. We calculated it
according to the standard two-step procedure based on the well-known
Weibull formula [17, 18]. The complementary cumulative distribution
function obtained in this way is sufficiently stable, and its number of
points is not reduced compared to that of the original empirical data
record.
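
A minimal sketch of such an empirical complementary cumulative distribution
function is given below. It assumes that the 'Weibull formula' refers to
the usual plotting position i/(n+1) for the i-th smallest of n data points;
the function name `weibull_ccdf` is hypothetical.

```python
def weibull_ccdf(incomes):
    """Return (sorted incomes, CCDF values) using plotting positions i/(n+1)."""
    xs = sorted(incomes)  # step 1: rank the data in ascending order
    n = len(xs)
    # step 2: exceedance probability of the i-th ranked point
    ccdf = [1.0 - i / (n + 1) for i in range(1, n + 1)]
    return xs, ccdf


xs, ccdf = weibull_ccdf([3.0, 1.0, 2.0, 4.0])
print(list(zip(xs, ccdf)))
```

Every data point receives its own probability, so the output has as many
points as the input record, and no point is assigned probability 0 or 1,
which keeps the log-log plot well defined.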

Let $m$ be an influx of income per unit time to a given household. We
treat $m$ as a variable obeying stochastic dynamics. Then, we can describe
its time evolution by using the nonlinear Langevin stochastic dynamics
equation [2, 3, 19]. This Langevin equation is equivalent to the following
Fokker-Planck equation for the probability distribution function (in the
Itô representation) [19]:

$$\frac{\partial}{\partial t} P(m,t) = \frac{\partial}{\partial m}\left[A(m)\,P(m,t)\right] + \frac{\partial^2}{\partial m^2}\left[B(m)\,P(m,t)\right]. \qquad (1)$$

Here, $A(m)$ is a drift coefficient and $B(m) = C^2(m)/2$, where the
coefficient $C(m)$ is the $m$-dependent amplitude of a temporal white
noise; together they play a fundamental role in the Langevin equation as a
stochastic force. The quantity $P(m,t)$ is the temporal income
distribution function. In general, coefficients $A(m)$ and $B(m)$ can be
additionally determined by the first and second moment of the income
change per unit time, respectively, only if these moments exist.
Subsequently, the equilibrium solution of Eq. (1), $P_{\rm eq}$, takes the
form:

$$P_{\rm eq}(m) = \frac{\rm const}{B(m)} \exp\left(-\int_{m_{\rm init}}^{m} \frac{A(m')}{B(m')}\,dm'\right), \qquad (2)$$

where $m_{\rm init}$ is the lowest household income and const is a
normalisation factor. This expression is used throughout this work.
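
For completeness, the step from Eq. (1) to Eq. (2) can be written out as
follows. This is a short sketch, assuming that Eq. (1) is read with both
terms on the right-hand side entering with a plus sign and that the
stationary probability flux vanishes:

```latex
% Stationarity: \partial P / \partial t = 0 in Eq. (1)
0 = \frac{d}{dm}\left[ A(m)\,P_{\rm eq}(m) + \frac{d}{dm}\big(B(m)\,P_{\rm eq}(m)\big) \right]
% Zero-flux condition: the bracket itself vanishes
\;\Rightarrow\;
\frac{d}{dm}\big(B\,P_{\rm eq}\big) = -\frac{A}{B}\,\big(B\,P_{\rm eq}\big)
% Integrating from m_init to m
\;\Rightarrow\;
B(m)\,P_{\rm eq}(m) = {\rm const}\,
\exp\left(-\int_{m_{\rm init}}^{m}\frac{A(m')}{B(m')}\,dm'\right)
```

Dividing the last line by $B(m)$ gives the form quoted in Eq. (2).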

Following the Yakovenko et al. model [2, 3], we can assume that changes of
income of the low-income society class are independent of the previous
income gained. This assumption is justified because the income of
households belonging to this class mainly takes the form of wages and
salaries. The stochastic process associated with a mechanism of this kind
is called the additive stochastic process. In this case, coefficients
$A(m)$ and $B(m)$ take the form of positive constants

$$A(m) = A_0, \qquad B(m) = B_0. \qquad (3)$$

This choice of coefficients leads to the Boltzmann-Gibbs law with the
exponential complementary cumulative distribution function [2, 3, 10-13]:

$$\Pi(m) = \int_m^{\infty} P_{\rm eq}(m')\,dm' = \exp\left(-\frac{m - m_{\rm init}}{T}\right). \qquad (4)$$

In Eq. (4), the distribution function is characterised by a single
parameter, i.e. an income temperature $T = B_0/A_0$, which can be
interpreted in this case as an average income per household.

For the medium- and high-income society classes, we can assume (again
following Yakovenko et al. [2, 3]) that changes of income are proportional
to the income gained so far. This assumption is also justified because
profits go to the medium- and high-income society classes mainly through
investments and capital gains. This type of stochastic process is called
the multiplicative stochastic process. Hence, coefficients $A(m)$ and
$B(m)$ obey the proportionality principle of Gibrat [20, 21]:

$$A(m) = a\,m, \qquad B(m) = b\,m^2 \;\Leftrightarrow\; C(m) = \sqrt{2b}\,m, \qquad (5)$$

where $a$ and $b$ are positive parameters. By using the equilibrium
distribution function, Eq. (2), we arrive in this case at the weak Pareto
law with the complementary cumulative distribution function [2JE1E]:

$$\Pi(m) = \int_m^{\infty} P_{\rm eq}(m')\,dm' = \left(\frac{m}{m_s}\right)^{-\alpha}. \qquad (6)$$

Here, $m_s$ is a scaling factor (depending on $a$, $b$, and const) while
$\alpha = 1 + a/b$ is the Pareto exponent. The ratio of the parameters $a$
and $b$ can be determined directly from the slope of the empirical data in
a log-log plot.
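
The slope-based estimate can be illustrated with a short Python sketch. It
assumes an ordinary least-squares fit in log-log coordinates, which the
text does not prescribe; the function name `loglog_slope` and the synthetic
data are hypothetical.

```python
import math


def loglog_slope(incomes, ccdf):
    """Ordinary least-squares slope of log(CCDF) versus log(income)."""
    xs = [math.log(m) for m in incomes]
    ys = [math.log(p) for p in ccdf]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic check: an exact power-law tail with exponent 2.5 and m_s = 1e5.
alpha_true, m_s = 2.5, 1.0e5
incomes = [m_s * 1.2 ** k for k in range(1, 11)]
ccdf = [(m / m_s) ** (-alpha_true) for m in incomes]

alpha_hat = -loglog_slope(incomes, ccdf)  # the slope equals minus the exponent
ratio_a_over_b = alpha_hat - 1.0          # Pareto exponent = 1 + a/b
print(alpha_hat, ratio_a_over_b)
```

Only the ratio a/b is recovered in this way; the slope alone does not fix
a and b separately.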

As Yakovenko et al. have already found [2, 3], the coexistence of additive
and multiplicative stochastic processes is allowed. By assuming that these
processes are uncorrelated, we get

$$A(m) = A_0 + a\,m, \qquad B(m) = B_0 + b\,m^2 = b\,(m_0^2 + m^2), \qquad (7)$$

where $m_0^2 = B_0/b$. This consideration leads (together with Eq. (2)) to
the significant Yakovenko et al. model with the probability distribution
function given by

$$P_{\rm eq}(m) = {\rm const}\,\frac{\exp\left[-(m_0/T)\arctan(m/m_0)\right]}{\left[1 + (m/m_0)^2\right]^{(\alpha+1)/2}}, \qquad (8)$$

where parameters $\alpha$ and $T$ are defined above.
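
As a check, the exponent and the arctan prefactor in Eq. (8) follow from
the integral in Eq. (2) with the coefficients of Eq. (7). The worked step
below is a sketch using only the definitions given above ($T = B_0/A_0$,
$m_0^2 = B_0/b$, Pareto exponent $\alpha = 1 + a/b$); terms depending on
$m_{\rm init}$ alone are absorbed into const.

```latex
\int^{m} \frac{A_0 + a\,m'}{b\,(m_0^2 + m'^2)}\,dm'
  = \frac{A_0}{b\,m_0}\,\arctan\frac{m}{m_0}
  + \frac{a}{2b}\,\ln\!\left(m_0^2 + m^2\right) + \text{const},
\qquad
\frac{A_0}{b\,m_0} = \frac{A_0\,m_0}{B_0} = \frac{m_0}{T},
\qquad
\frac{a}{2b} = \frac{\alpha - 1}{2}.

% Inserting into Eq. (2), with 1/B(m) \propto [1 + (m/m_0)^2]^{-1}:
P_{\rm eq}(m) \propto
  \frac{\exp\left[-(m_0/T)\arctan(m/m_0)\right]}
       {\left[1 + (m/m_0)^2\right]^{1 + (\alpha-1)/2}}
  = \frac{\exp\left[-(m_0/T)\arctan(m/m_0)\right]}
         {\left[1 + (m/m_0)^2\right]^{(\alpha+1)/2}} .
```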

Based on the Yakovenko et al. formula, Eq. (8), the complementary
cumulative distribution function can describe the income of only the low-
and medium-income society classes. However, it does not capture that of
the most intriguing high-income society class.

The goal of our present work is to derive from Eq. (2) a distribution
function that covers all three ranges of the empirical data records, i.e.
the low-, medium-, and high-income classes of the society (including also
the two short intermediate regions between them). To do that, we have to
provide the function $A(m)$ in the threshold form:

$$A(m) = \begin{cases} A_{<}(m) = A_0 + a\,m & \text{if } m < m_1, \\ A_{\geq}(m) = A'_0 + a'\,m & \text{if } m \geq m_1. \end{cases} \qquad (9)$$

At the threshold $m_1$, there is a jump of the proportionality coefficient
of the drift term. That is, this term abruptly changes from $a$ to $a'$
while the formalism of the income change remains the same for the whole
society. This formalism is expressed by the threshold nonlinear Langevin
equation, where particular dynamics distinguishes the range of the
high-income society class from those of the others.

The threshold parameter $m_1$ can be interpreted as a crossover income
between the medium- and high-income society classes. Remarkably, both
income crossovers $m_0$ and $m_1$ ($> m_0$) are exogenous parameters. They
should be determined from the dependence of the empirical complementary
cumulative distribution function on variable $m$ because both crossovers
are sufficiently distinct.

Subsequently, by substituting Eq. (9) into Eq. (2), we finally get (with
$c'$ and $c''$ denoting the constant prefactors of the two branches)

$$P_{\rm eq}(m) = \begin{cases} c'\,\dfrac{\exp\left[-(m_0/T)\arctan(m/m_0)\right]}{\left[1 + (m/m_0)^2\right]^{(\alpha+1)/2}} & \text{if } m < m_1, \\[2ex] c''\,\dfrac{\exp\left[-(m_0/T_1)\arctan(m/m_0)\right]}{\left[1 + (m/m_0)^2\right]^{(\alpha_1+1)/2}} & \text{if } m \geq m_1, \end{cases} \qquad (10)$$

where $\alpha_1 = 1 + a'/b$ and $T_1 = B_0/A'_0$. Apparently, the number of
free (effective) parameters driving the two-branch distribution function,
Eq. (10), is reduced because this function depends only on ratios of the
initial parameters defining the nonlinear Langevin dynamics.

For $m_1 \gg m_0$, the interpretation of the distribution function,
Eq. (10), is self-consistent, as required, because the two power-law
regimes are well defined. Then, for instance, for $m \gg m_0$ the second
branch in Eq. (10) becomes a power-law dependence driven by the Pareto
exponent $\alpha_1$, different (in general) from $\alpha$.

Importantly, our analysis indicates that the existence of the third income 
region is already allowed by theory. We are following this indication below. 

3. Results and discussion 

In principle, we are ready to compare the theoretical complementary
cumulative distribution function based on our probability distribution
function $P_{\rm eq}(m)$, given by Eq. (10), with the empirical data for
the whole income range. However, this theoretical complementary cumulative
distribution function is not known in a closed explicit form. Therefore,
we calculate it numerically. The key technical question is how to fit this
complicated theoretical function to the empirical data. Fortunately, all
parameters can be found (in principle) by using independent fitting
routines, so the fitting procedure consists of three steps, as follows.
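
The numerical evaluation can be sketched as below. The sketch is not the
routine used in this work: it assumes the two-branch density obtained by
inserting Eqs. (7) and (9) into Eq. (2), i.e. the form of Eq. (8) with
temperature T and exponent α below m_1 and with T_1 and α_1 above it, with
both branches matched at m_1. The trapezoidal rule, the logarithmic grid,
the cut-off `m_max`, the value of `m_init` and all function names are
assumptions made for illustration.

```python
import math


def p_eq_unnormalised(m, T, m0, m1, alpha, alpha1, T1=None):
    """Two-branch density up to normalisation; branches matched at m1."""
    T1 = T if T1 is None else T1

    def branch(x, temp, expo):
        decay = math.exp(-(m0 / temp) * math.atan(x / m0))
        return decay / (1.0 + (x / m0) ** 2) ** ((expo + 1.0) / 2.0)

    if m < m1:
        return branch(m, T, alpha)
    # Ratio of the two prefactors, fixed here by continuity at m1 (assumed).
    match = branch(m1, T, alpha) / branch(m1, T1, alpha1)
    return match * branch(m, T1, alpha1)


def ccdf_on_grid(m_init, m_max, n, **params):
    """Trapezoidal CCDF on a logarithmic grid, normalised on [m_init, m_max]."""
    ratio = (m_max / m_init) ** (1.0 / (n - 1))
    grid = [m_init * ratio ** k for k in range(n)]
    dens = [p_eq_unnormalised(m, **params) for m in grid]
    tail = [0.0] * n
    for k in range(n - 2, -1, -1):  # accumulate the integral from the top down
        step = grid[k + 1] - grid[k]
        tail[k] = tail[k + 1] + 0.5 * (dens[k] + dens[k + 1]) * step
    return grid, [t / tail[0] for t in tail]


# T, m0, m1, alpha, alpha1 as quoted in the caption of Fig. 1;
# m_init = 1e3 EUR and m_max = 1e9 EUR are illustrative guesses.
grid, ccdf = ccdf_on_grid(1.0e3, 1.0e9, 2000,
                          T=37.0e3, m0=1.60e5, m1=3.0e5,
                          alpha=2.8643, alpha1=0.70)
```

Because the high-income branch decays slowly, truncating the integral at
`m_max` affects the normalisation; a larger cut-off or an analytic tail
correction would be needed for quantitative use.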

In the initial step, we found approximate values of the crossovers $m_0$
and $m_1$ directly from the plot of the empirical complementary cumulative
distribution function (or empirical data). The uncertainty of the $m_0$
and $m_1$ parameters did not exceed 10%, which was sufficiently accurate.
Moreover, we took the exact value of the parameter $m_{\rm init}$ as the
first point in the record of the empirical data.

Secondly, we determined the value of the temperature $T$ by fitting the
Boltzmann-Gibbs formula, Eq. (4), to the corresponding empirical data in
the range extending from $m_{\rm init}$ to $m_0$ (both found in the initial
step). Notably, we assumed that this formula could be characterised by a
single temperature value since the society as a whole was considered to be
in (partial) equilibrium during the whole fiscal year. That is, we further
put $T_1 = T \Leftrightarrow A'_0 = A_0$.
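
One possible way to carry out this step is sketched below. It assumes a
least-squares fit of the logarithm of the complementary cumulative
distribution function against income, using that Eq. (4) gives a straight
line of slope -1/T in semi-logarithmic coordinates; the routine actually
used is not specified here, and `fit_temperature` and the synthetic data
are hypothetical.

```python
import math


def fit_temperature(incomes, ccdf, m_lo, m_hi):
    """Fit ln(CCDF) = c - m/T on [m_lo, m_hi] and return T = -1/slope."""
    pts = [(m, math.log(p)) for m, p in zip(incomes, ccdf)
           if m_lo <= m <= m_hi and p > 0.0]
    n = len(pts)
    mean_m = sum(m for m, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((m - mean_m) ** 2 for m, _ in pts)
    sxy = sum((m - mean_m) * (y - mean_y) for m, y in pts)
    return -sxx / sxy


# Synthetic check: exact Boltzmann-Gibbs data with T = 37e3 EUR;
# m_init = 1e3 EUR is an illustrative guess, m0 as in the caption of Fig. 1.
T_true, m_init, m0 = 37.0e3, 1.0e3, 1.60e5
incomes = [m_init + 2.0e3 * k for k in range(100)]
ccdf = [math.exp(-(m - m_init) / T_true) for m in incomes]

T_hat = fit_temperature(incomes, ccdf, m_init, m0)
print(T_hat)
```

Points above the crossover are excluded by the range arguments, so the
power-law part of the data does not bias the estimate of T.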

In the third step, we determined the exponents $\alpha$ and $\alpha_1$ by
separately fitting the weak Pareto law to the empirical data for the
medium- and high-income society classes, respectively.

Hence, we obtained all values required by the extended Yakovenko et al.
formula, Eq. (10). The corresponding plots of the empirical and
theoretical complementary cumulative distribution functions in the log-log
scale are compared in Fig. 1, for instance, for year 2007. Apparently, the
predictions of the extended Yakovenko et al. formula, Eq. (10), (solid
curve in Fig. 1) agree well with the empirical data (dots in Fig. 1) for
the low- and medium-income society classes, while the agreement for the
high-income society class is satisfactory.

4. Concluding remarks 

Herein, we showed that the household incomes of all society classes in
the EU can be modelled by the nonlinear threshold Langevin dynamics with
$m$-dependent drift and dispersion as the ab initio level. At the threshold
$m_1$, there is a jump of the proportionality coefficient of the drift
term. That is, this term abruptly changes from $a$ to $a'$, where $a' < a$
(as $\alpha_1 < \alpha$). It means that the stochastic term in the Langevin
equation is relatively more significant in this case (i.e. above the
threshold $m_1$) than the drift term.

Fig. 1. Fit of the complementary cumulative distribution function, based on
the extended Yakovenko et al. formula, Eq. (10), (solid line) to the EU
household income empirical data set (dots) for year 2007
($T_1 = T_2 = T = 37 \times 10^3 \pm 1 \times 10^3$ EUR,
$m_0 = 1.60 \times 10^5 \pm 0.16 \times 10^5$ EUR,
$m_1 = 3 \times 10^5 \pm 0.3 \times 10^5$ EUR, $\alpha = 2.8643 \pm 0.0008$,
and $\alpha_1 = 0.70 \pm 0.02$) HSJQi].

Furthermore, for the medium-income society class the Pareto exponent
$\alpha > 2$. This means that the variance of the Pareto distribution
function exists and is finite. However, for the high-income society class
the variance of the Pareto distribution function is infinite because
$\alpha_1 < 1$. That is, taking the variance as a measure of risk, the
economic activity of the high-income society class can be considered more
risky than the activities of all other society classes, as expected [T].
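
The role of the value 2 for the Pareto exponent can be seen from a short
calculation. The sketch below assumes a pure Pareto law, Eq. (6), valid for
all $m \geq m_s$, with the exponent written generically as $\alpha$:

```latex
% Density corresponding to \Pi(m) = (m/m_s)^{-\alpha}, m \ge m_s
p(m) = -\frac{d\Pi}{dm} = \alpha\, m_s^{\alpha}\, m^{-\alpha-1}

% Second moment
\langle m^2 \rangle = \alpha\, m_s^{\alpha} \int_{m_s}^{\infty} m^{1-\alpha}\,dm
  = \begin{cases}
      \dfrac{\alpha\, m_s^{2}}{\alpha - 2} & \text{if } \alpha > 2, \\[1.5ex]
      \infty & \text{if } \alpha \le 2 .
    \end{cases}
```

Hence the variance is finite only for an exponent above 2; for an exponent
not exceeding 1 the first moment diverges as well.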

The completed database that we used (obtained by properly joining the
Forbes empirical database with that of EU-SILC) emphasises the significant
role of the high-income society class. That is, only a study of the income
of all society classes enables an adequate characterisation of the
relative wealth of society.